Abstract



Current syntactic theory limits the range of grammatical variation so severely that the logical problem 
of grammar learning is trivial. Yet, children exhibit characteristic stages in syntactic development at 
least through their sixth year. Rather than positing maturational delays, I suggest that acquisition 
difficulties are the result of limitations in manipulating grammatical representations. I argue that the 
genesis of complex sentences reflects increasing generative capacity in the systems generating structural 
descriptions: conjoined clauses demand only a regular tree rewriting system; sentential embedding uses 
a context-free tree substitution grammar; modification requires TAG, a mildly context-sensitive system. 



I. Some current views of natural language syntax localize all cross-linguistic variation in a small set of 
finite-valued parameters. This has the effect of eliminating certain in-principle learnability problems that 
plagued earlier views of grammar. Since the range of possible hypotheses is restricted, the child will show an 
inductive bias, that is, she will at times be driven to conclusions seemingly stronger than those warranted 
by the input data. Further, learnability in the limit is guaranteed, since the number of possible grammars 
is finite. Finally, the limited amount and simplicity of information to be acquired means that grammars 
should be acquired quickly and easily. Unfortunately, this beautiful picture conflicts with what we know 
from empirical studies of language acquisition: children exhibit characteristic developmental stages in their
acquisition of grammar. Consequently, if such a parametric view of syntactic variation is correct, children
must be held back in their attempts at syntactic learning by something other than the inherent difficulty of
the learning task.

I suggest that children's acquisitional difficulties result not from problems of grammatical acquisition per 
se, but rather from their limited abilities in manipulating grammatical representations. In particular, I 
argue that the sequence of certain stages in syntactic development can best be understood as a reflection
of ever increasing generative complexity of the underlying formal grammatical systems used by the child to
construct her tree structure representations.

II. It has been widely reported in empirical studies of language acquisition that different types of complex 
(i.e. multi-clausal) sentences vary with respect to the point at which children first exhibit mastery of them. 
Looking at the naturalistic production data of four children, Bloom et al. (1980) report that the productive
use of complex sentences involving conjunction consistently precedes that of sentences involving complemen- 
tation which in turn precedes sentences involving relativization. This result is supported by experimental 
study of children's comprehension. Tavakolian (1981) demonstrates that young children exhibit difficulty in 
interpreting relative clauses. She argues that the interpretations that children do assign to such structures 
result from their (incorrectly) imposing a conjoined clause analysis. I take this tendency to prefer conjunction 
over relativization to be the same effect observed by Bloom and her colleagues. Further, Goodluck (1981), 
Hsu et al. (1985), and McDaniel and Cairns (1990), among others, have found that children correctly interpret
"control constructions" in which the empty subject is within a complement clause, such as (1), at an earlier 
age than cases where the empty subject is within an adverbial clause as in (2). 

(1) a. Cookie Monster tells Grover_i [PRO_i to jump over the fence]
    b. Grover_i was told by Cookie Monster [PRO_i to jump over the fence]

(2) Cookie Monster_i touches Grover [after PRO_i jumping over the fence]

Let us suppose that a child's incorrect interpretations in (2) are the result of her inability to assign this 
sentence a structural representation appropriate according to the adult grammar. If we assume that complex
sentences containing adverbial clause adjuncts are similar to sentences containing relative clauses in the
relevant structural respects, i.e., they involve adjunction structures, this phenomenon can be seen as another 
instance of Bloom et al.'s sequence of complementation before relativization (now modification). 

III. Thus far, we've seen that children's acquisition of complex constructions proceeds according to the 
sequence coordination < complementation < modification. Yet, we have not provided an explanation for 
why these should be so ordered. What I will now suggest is that these stages are ordered by the ever greater 
demands of generative capacity that they impose upon the formal tree rewriting system used to construct 
phrase structure representations. 

Before proceeding with this, we will require a brief detour into defining a novel tree rewriting formalism. 
Developing a suggestion of Weir (1987) (though in a restricted fashion), let us define a schematic tree
grammar (STG) as a 4-tuple $G = (V_N, V_T, S, I)$ where $V_N$ is a finite set of non-terminals, $V_T$ is a finite
set of terminals, $S$ is a distinguished non-terminal and $I$ is a finite set of schematic initial trees. The set
of schematic initial trees in an STG may be any finite set of finite tree structures whose frontier nodes are
drawn from $V_T \cup V_N$ and whose internal nodes are drawn from $V_N$. Further, all nodes of schematic initial
trees but the root may be annotated with the superscripts $+$ or $*$. The intuition behind the use of these
superscripts is the same as their usage in regular expressions: a schematic tree containing such superscripts
represents an infinite set of trees, just as a regular expression containing these operators represents an
infinite set of strings. When a schematic tree contains a node $A$ marked by $+$, the class of trees represented
by this schematic tree includes trees containing 1 or more copies of the subtree dominated by $A$, each copy
attached to $A$'s parent. Similarly, schematic trees containing nodes marked $*$ correspond to those trees where
this node appears 0 or more times. We formalize this as follows:

Definition 1 A (possibly empty) sequence of trees $(\tau_1, \ldots, \tau_k)$ instantiates a schematic tree $\alpha$ iff:

1. if the root of $\alpha$ is superscripted by $+$, the sequence of $\tau_i$'s is of length $\geq 1$;

2. if the root of $\alpha$ is not superscripted, the sequence of $\tau_i$'s is of length exactly 1;

3. for each $\tau_i$, the root is labelled identically to the root of $\alpha$;

4. for each $\tau_i$, the sequence of subtrees dominated by the children of the root of $\tau_i$, $(\tau^1, \ldots, \tau^n)$, may be
partitioned into a sequence of contiguous subsequences, $(\tau^1, \ldots, \tau^j), (\tau^{j+1}, \ldots, \tau^l), \ldots, (\tau^m, \ldots, \tau^n)$, so that these
subsequences successively instantiate the subtrees dominated by the children of the root of $\alpha$ from left to right.
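
Definition 1 can be read as a recursive membership test. The following Python sketch is offered only as an illustration: the class names `Schema` and `Tree`, the function names, and the toy schema are assumptions introduced here, and the superscript $*$ is treated as imposing no length restriction, since clauses 1 and 2 constrain only $+$ and unmarked roots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """A schematic tree node; mark is "" (exactly one), "+" or "*"."""
    label: str
    children: tuple = ()
    mark: str = ""


@dataclass(frozen=True)
class Tree:
    """An ordinary (unannotated) tree node."""
    label: str
    children: tuple = ()


def instantiates(trees, schema):
    """Check whether the sequence `trees` instantiates `schema` (Definition 1)."""
    k = len(trees)
    if schema.mark == "+" and k < 1:      # clause 1
        return False
    if schema.mark == "" and k != 1:      # clause 2
        return False
    # Clauses 3 and 4 must hold for every tree in the sequence.
    return all(
        t.label == schema.label and _partition(t.children, schema.children)
        for t in trees
    )


def _partition(subtrees, schemas):
    """Clause 4: split `subtrees` into contiguous blocks, one per child schema."""
    if not schemas:
        return len(subtrees) == 0
    first, rest = schemas[0], schemas[1:]
    return any(
        instantiates(subtrees[:cut], first) and _partition(subtrees[cut:], rest)
        for cut in range(len(subtrees) + 1)
    )


# Hypothetical schema: S dominates NP and VP; VP dominates V and one or more NPs.
SCHEMA = Schema("S", (Schema("NP"),
                      Schema("VP", (Schema("V"), Schema("NP", mark="+")))))
two_objects = Tree("S", (Tree("NP"),
                         Tree("VP", (Tree("V"), Tree("NP"), Tree("NP")))))
no_object = Tree("S", (Tree("NP"), Tree("VP", (Tree("V"),))))

print(instantiates((two_objects,), SCHEMA))  # True
print(instantiates((no_object,), SCHEMA))    # False
```

The search over cut points is exponential in the worst case; it is meant to mirror the wording of the definition, not to be an efficient recognizer.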



Derivations in an STG $G$ do not directly utilize the schematic trees in $I$, but rather manipulate the trees
which instantiate the trees in $I$. The only combinatory operation we allow in an STG is substitution.
Application of substitution is, however, restricted so as to prevent the generation of recursive structures. The
set of derivable trees from a grammar $G$, $D(G)$, is thus defined as follows:

Definition 2 $\tau$ is derivable by $G$, $\tau \in D(G)$, iff:

1. $\tau$ instantiates some $\alpha \in I$; or

2. $\tau$ is the result of substituting $\tau'$ into $\tau''$ where $\tau', \tau'' \in D(G)$ and no non-terminal on the path from the
root of $\tau''$ to the site of substitution appears in $\tau'$.
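
The restriction in clause 2 can be made concrete with a short Python sketch. It rests on two assumptions not stated in the text: the "path" is taken to include the root of the host tree and every node above the substitution site, but not the site itself (otherwise no substitution could ever succeed), and a non-terminal "appears in" a tree if it labels any of its nodes. The names `substitute`, `labels` and `NONTERMINALS` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


def labels(tree):
    """Set of all node labels occurring in `tree`."""
    found = {tree.label}
    for child in tree.children:
        found |= labels(child)
    return found


def substitute(host, path, inner, nonterminals):
    """Substitute `inner` at the frontier node of `host` reached by `path`
    (a tuple of child indices). Return None if the step is not allowed."""
    ancestors = set()
    node = host
    for index in path:
        ancestors.add(node.label)      # labels strictly above the site
        node = node.children[index]
    if node.children or node.label != inner.label:
        return None                    # site must be a frontier node labelled like inner's root
    if ancestors & labels(inner) & nonterminals:
        return None                    # would create a recursive structure
    return _rebuild(host, path, inner)


def _rebuild(tree, path, inner):
    """Copy `tree`, replacing the node at `path` with `inner`."""
    if not path:
        return inner
    i = path[0]
    new_child = _rebuild(tree.children[i], path[1:], inner)
    return Tree(tree.label, tree.children[:i] + (new_child,) + tree.children[i + 1:])


NONTERMINALS = {"S", "NP", "VP", "V"}   # hypothetical V_N
host = Tree("S", (Tree("NP"), Tree("VP", (Tree("V"), Tree("S")))))
john = Tree("NP", (Tree("John"),))

allowed = substitute(host, (0,), john, NONTERMINALS)
blocked = substitute(host, (1, 1), host, NONTERMINALS)
print(allowed is not None)  # True
print(blocked is None)      # True
```

In the second call the tree being substituted contains S and VP, both of which label nodes above the embedded S site, so the step is rejected. On this reading it is exactly sentential embedding that the restriction rules out.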

We finally define the set of trees and strings which are generated by an STG G as follows: 

Definition 3 The tree set of an STG $G$ is the set of trees $T(G) = \{\tau \mid \tau \in D(G),\ \tau \text{ is rooted in } S \text{ and the frontier of } \tau \text{ is in } V_T^*\}$.

Definition 4 The string language of an STG $G$ is $L(G) = \{w \mid w \text{ is the frontier of some } \tau \in T(G)\}$.

Two things concerning this tree rewriting system are relevant for our concerns. The first is that the string 
languages generated by STGs are all and only the regular languages. The second is that, in spite of its
relatively weak formal power, it is nonetheless sufficiently expressive to capture analyses of (certain cases of)
coordination structures. If the schematic tree below on the left is in the grammar, the STG will generate
the string "John has eaten an apple and Fred has eaten peaches and a candy bar". The tree below on the right,
which duplicates the IP root node once and duplicates the NP object, instantiates this schematic tree. We
need only perform the relevant substitutions into the NP nodes to complete the derivation.^1

^1 In the interest of space, we put aside certain complications concerning the insertion of conjunctions and the allowance for some
degree of lexical variation within the internal structure of coordinated phrases. Such issues can be dealt with without increase
in generative capacity.

This regular tree rewriting system does not allow for the generation of syntactic structures appropriate for
sentential complementation. There is simply no way to produce the unbounded degree of embedding that 
is necessary from a linguistic standpoint. If, however, we move to the system of tree rewriting discussed by 
Schabes (1990) called Tree Substitution Grammars (TSG), complementation can be accommodated. A TSG 
consists of a finite set of (finite) elementary trees whose leaves may be either terminals or non-terminals. 
As before, derivations consist in substituting these elementary trees rooted in some category C into non- 
terminals at nodes along the frontiers of other elementary trees also labeled C, but no restriction is imposed 
on possible recursion. Schabes observes that TSGs are strictly context-free in their weak generative capacity,
though the tree sets they produce are somewhat richer.^2
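
To illustrate how unrestricted substitution yields unbounded embedding, here is a minimal Python sketch. The two elementary trees `MATRIX` and `SIMPLE`, their lexical items, and the helper names are hypothetical and chosen only for illustration; they are not taken from Schabes (1990).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


# Hypothetical elementary trees. The childless S node in MATRIX is a substitution site.
MATRIX = Tree("S", (Tree("NP", (Tree("Mary"),)),
                    Tree("VP", (Tree("V", (Tree("thinks"),)), Tree("S")))))
SIMPLE = Tree("S", (Tree("NP", (Tree("John"),)),
                    Tree("VP", (Tree("V", (Tree("left"),)),))))


def substitute_first(host, inner):
    """Substitute `inner` at the leftmost frontier node of `host` whose label
    matches the root of `inner`. Returns (new_tree, whether_a_site_was_found)."""
    if not host.children:
        if host.label == inner.label:
            return inner, True
        return host, False
    new_children, done = [], False
    for child in host.children:
        if not done:
            child, done = substitute_first(child, inner)
        new_children.append(child)
    return Tree(host.label, tuple(new_children)), done


def frontier(tree):
    """Left-to-right list of leaf labels."""
    if not tree.children:
        return [tree.label]
    return [word for child in tree.children for word in frontier(child)]


def embedded_sentence(depth):
    """Embed SIMPLE under `depth` copies of MATRIX; no recursion check is applied."""
    tree = SIMPLE
    for _ in range(depth):
        tree, _found = substitute_first(MATRIX, tree)
    return tree


print(" ".join(frontier(embedded_sentence(2))))  # Mary thinks Mary thinks John left
```

Because nothing bounds `depth`, the set of derived trees has no bound on the degree of embedding, which is what the restricted substitution of the STG excludes.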

Suppose now that we wish to generate sentences containing modification structures, e.g. relativization 
and adverbial adjuncts. It is indeed possible to use TSG to produce linguistically natural derived structural 
representations. We may however have independent motivations for what may constitute an elementary tree 
in our grammar. In particular, we might propose (following, among others, Frank 1992) that the elementary 
trees of a tree rewriting system should contain only information concerning a single predicate, such as a verb, 
and its associated argument structure. If this is true, then there can be no representation of a modification 
structure (such as an adverbial modifier) in an elementary tree since it does not play a role in the argument 
structure of the predicate heading that tree. Consequently, TSG will not be sufficient for our purposes. 
Instead, we must turn to a somewhat more powerful system of tree rewriting which allows the operation 
of adjoining, namely Tree Adjoining Grammar (TAG; Joshi, Levy and Takahashi 1975). Adjoining allows
us to introduce modification structures into elementary trees which previously lacked any representation of 
them. Thus, TAG provides us with a richer and more linguistically appealing class of derivations for a set 
of trees that are generable by TSGs. Note that the weak generative capacity of TAG is strictly greater 
than the context-free power of TSG, though it is nonetheless restricted to the class of so-called "mildly 
context-sensitive languages" (Joshi, Vijay-Shanker and Weir 1991). 
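
As a rough illustration of the adjoining operation, the Python sketch below splices an auxiliary tree into an elementary tree that contains no trace of the modifier. The trees `HOST` and `AUX`, the `foot` flag, and the function names are assumptions made for this example; constraints on adjoining, which a full TAG would include, are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()
    foot: bool = False  # True only on the foot node of an auxiliary tree


def plug_foot(aux, subtree):
    """Copy the auxiliary tree, putting `subtree` in place of its foot node."""
    if aux.foot:
        return subtree
    return Tree(aux.label, tuple(plug_foot(child, subtree) for child in aux.children))


def adjoin(host, path, aux):
    """Adjoin `aux` at the node of `host` reached by `path` (child indices):
    the subtree at that node is excised, `aux` takes its place, and the
    excised subtree is reattached at the foot of `aux`."""
    if not path:
        if host.label != aux.label:
            raise ValueError("label mismatch at adjunction site")
        return plug_foot(aux, host)
    i = path[0]
    new_child = adjoin(host.children[i], path[1:], aux)
    return Tree(host.label, host.children[:i] + (new_child,) + host.children[i + 1:])


def frontier(tree):
    """Left-to-right list of leaf labels."""
    if not tree.children:
        return [tree.label]
    return [word for child in tree.children for word in frontier(child)]


# Hypothetical elementary tree: one predicate with its argument, no modifier.
HOST = Tree("S", (Tree("NP", (Tree("Grover"),)),
                  Tree("VP", (Tree("V", (Tree("jumps"),)),))))
# Hypothetical auxiliary tree: VP root, VP foot, adverbial modifier.
AUX = Tree("VP", (Tree("VP", foot=True), Tree("Adv", (Tree("quickly"),))))

derived = adjoin(HOST, (1,), AUX)
print(" ".join(frontier(derived)))  # Grover jumps quickly
```

The elementary tree for the verb stays limited to the predicate and its argument; the modifier enters only through the derivation.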

IV. To summarize, we have seen that there is an increase in generative complexity associated with the 
tree rewriting systems necessary for coordination, complementation and modification. I suggest that it is 
precisely this increase in complexity that gives rise to this acquisitional sequence. It is important to observe 
that we are crucially dealing with complexity measures in terms of tree rewriting systems here rather than 
the more traditional string rewriting systems. In building syntactic representations, the core problem is 
the recovery and appropriate factorization of dependencies. These problems are most naturally addressed, I
would argue, in a tree rewriting framework. Further, from the perspective of string language complexity, both
coordination and (right branching) complementation produce regular sets. Consequently, on the basis of a 
string complexity measure, we would not (contrary to fact) predict any difference in acquisitional difficulty 
between these cases. In previous work (Frank 1992), I found that other complex sentence structures which
are tied to the use of the adjoining operation, but which are at least a priori distinct from modification, show 
similar delayed acquisition. This provides, I claim, independent confirmation that our tree rewriting based 
approach is on the right track. 
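
The point about string complexity can be checked on a toy fragment. In the Python sketch below, both a coordination pattern and a right-branching complementation pattern are written as ordinary regular expressions without backreferences, so each denotes a regular string set. The sentences and pattern names are hypothetical.

```python
import re

# Toy string sets; the vocabulary is invented for illustration.
COORDINATION = re.compile(r"John left( and John left)*")
COMPLEMENTATION = re.compile(r"(Mary thinks )*John left")

print(bool(COORDINATION.fullmatch("John left and John left")))               # True
print(bool(COMPLEMENTATION.fullmatch("Mary thinks Mary thinks John left")))  # True
```

At the level of strings the two constructions look alike: each is a single Kleene-star pattern. The difference appears only in the trees assigned to them, which is why the complexity measure is stated over tree rewriting systems.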

^2 We point out that the addition of the device of schematic trees to TSG or to TAG does not increase the weak generative
capacity since such node expansion can be simulated using substitution or adjoining, though strong generative capacity is affected
since tree structures of arbitrary arity cannot be generated without the use of such schemas. We leave open the issue of whether
children's grammatical systems (as well as those of adults) include such schemas all the way through development.



Finally, we can ask whether this limited ability to manipulate systems of tree rewriting is tied to compu- 
tational load, following the proposals of Joshi (1990) and Rambow (1992). Certain experimental evidence 
suggests that such an approach is correct: children's grammatical difficulties can be alleviated to some degree 
if extraneous task demands are diminished (cf. Crain and Fodor 1993 for a review). Thus, if more powerful
tree rewriting mechanisms are unavailable simply because they demand too much of the child's resources, 
they might become available when other resources are freed. 